# Low-Latency Inference

## NeuroBERT-Mini
*boltuix · MIT · Large Language Model, Transformers · 212 downloads · 10 likes*

NeuroBERT-Mini is a lightweight natural language processing model derived from google/bert-base-uncased, optimized for real-time inference on edge and IoT devices.

## Vaani
*panchajanya-ai · Apache-2.0 · Audio Classification, Multilingual · 25 downloads · 2 likes*

A multilingual audio classification model based on speechbrain/lang-id-commonlanguage_ecapa, supporting identification of five Indian languages.
## Japanese Reranker Tiny V2
*hotchpotch · MIT · Text Embedding, Japanese · 339 downloads · 3 likes*

A very compact, fast Japanese reranking model that improves the accuracy of RAG systems and runs efficiently on CPUs and edge devices.

## Japanese Reranker XSmall V2
*hotchpotch · MIT · Text Embedding, Japanese · 260 downloads · 1 like*

A very compact, fast Japanese reranking model for improving the accuracy of RAG systems.
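Rerankers like the two above sit between retrieval and generation in a RAG pipeline: a first-stage retriever returns candidate passages, and the reranker re-scores each (query, passage) pair so only the most relevant passages reach the LLM. A minimal sketch of that second stage, with a toy lexical-overlap scorer standing in for the actual model:

```python
def toy_score(query: str, passage: str) -> float:
    # Toy stand-in for a cross-encoder: the fraction of query tokens
    # that also appear in the passage. A real reranker would instead
    # run the (query, passage) pair through the model.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    # Score every retrieved candidate, then keep the top_k best.
    scored = [(toy_score(query, p), p) for p in passages]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

Swapping `toy_score` for a model call is the only change needed to use a real reranker; the surrounding sort-and-truncate logic stays the same.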
## Qwen2.5-VL-72B-Instruct FP8 Dynamic
*parasail-ai · Apache-2.0 · Image-to-Text, Transformers, English · 78 downloads · 1 like*

FP8-quantized version of Qwen2.5-VL-72B-Instruct, supporting vision-text input and text output, optimized and released by Neural Magic.

## Gemma 3 4B IT INT8 Asym OV
*Echo9Zulu · Apache-2.0 · Image-to-Text · 152 downloads · 1 like*

A Gemma 3 4B-parameter model optimized with OpenVINO, supporting text-to-text and vision-text inference.
## Faster Distil-Whisper Large V3.5
*Purfview · MIT · Speech Recognition, English · 565 downloads · 2 likes*

Distil-Whisper is a distilled version of the Whisper model, optimized for automatic speech recognition (ASR) and offering faster inference.

## Faster Distil-Whisper Large V3.5
*deepdml · MIT · Speech Recognition, English · 58.15k downloads · 2 likes*

A CTranslate2-format model converted from Distil-Whisper large-v3.5 for efficient speech recognition.

## RWKV7 Goose World3 2.9B HF
*RWKV · Apache-2.0 · Large Language Model, Multilingual · 132 downloads · 7 likes*

The RWKV-7 model uses the flash-linear-attention format, supports multilingual text generation, and has 2.9 billion parameters.
## Phi-4-Multimodal-Instruct
*mjtechguy · MIT · Multimodal Fusion, Transformers, Multilingual · 18 downloads · 0 likes*

Phi-4-multimodal-instruct is a lightweight open-source multimodal foundation model that accepts text, image, and audio inputs and generates text outputs, with a 128K-token context length.

## Pixtral 12B Quantized.w8a8
*RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · 309 downloads · 1 like*

INT8-quantized version of mgoin/pixtral-12b, supporting vision-text multimodal tasks with improved inference efficiency.

## Qwen2.5-VL-7B-Instruct Quantized.w8a8
*RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · 1,992 downloads · 3 likes*

Quantized version of Qwen2.5-VL-7B-Instruct, supporting vision-text input and text output, with inference efficiency improved through INT8 weight quantization.
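Several of the checkpoints above ship INT8-quantized weights. The core idea is to map float weights onto 8-bit integers via a scale factor, then dequantize on the fly at inference time, trading a small accuracy loss for roughly 4× smaller weights. A minimal sketch of symmetric per-tensor quantization (the repos above typically use finer-grained per-channel schemes with calibration, so this is illustrative only):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric per-tensor quantization: one scale maps the float
    # range [-max_abs, max_abs] onto the INT8 range [-127, 127].
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate float weights; the rounding error per
    # weight is bounded by scale / 2.
    return [qi * scale for qi in q]
```

The round trip `dequantize_int8(*quantize_int8(w))` reproduces each weight to within half a quantization step, which is why the quality loss of weight-only INT8 is usually modest.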
## LB Reranker 0.5B V1.0
*lightblue · Apache-2.0 · Large Language Model, Transformers, Multilingual · 917 downloads · 66 likes*

The LB Reranker scores the relevance between queries and text snippets, supports 95+ languages, and is suitable for ranking and reranking in retrieval tasks.

## Kotoba-Whisper-Bilingual V1.0
*kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Multilingual · 782 downloads · 13 likes*

Kotoba-Whisper-Bilingual is a collection of distilled Whisper models designed for Japanese and English speech recognition and speech-to-text translation.

## LayerSkip Llama2 7B
*facebook · Other · Large Language Model, Transformers, English · 1,674 downloads · 14 likes*

An improved model based on Llama 2 7B that supports layer skipping and self-speculative decoding to improve inference efficiency.
## Llama 3 FireFunction V2
*fireworks-ai · Large Language Model, Transformers · 1,361 downloads · 145 likes*

FireFunction V2 is a state-of-the-art function-calling model with a commercially viable license, trained on Llama 3, supporting parallel function calls and strong instruction following.

## LLM Compiler 13B
*facebook · Other · Large Language Model, Transformers · 107 downloads · 84 likes*

LLM Compiler is an advanced LLM based on Code Llama, designed for code optimization and compiler reasoning tasks.

## TinyAgent ToolRAG
*squeeze-ai-lab · Large Language Model, Transformers, English · 45 downloads · 16 likes*

TinyAgent is a small language model (SLM) designed for edge devices, focused on function calling and complex reasoning, offering privacy protection and low-latency serving.

## Hiera Base 224 In1k HF
*facebook · Image Classification, Transformers, English · 188 downloads · 2 likes*

Hiera is a hierarchical vision Transformer that is fast, powerful, and simple. It surpasses the state of the art across a wide range of image and video tasks while significantly improving runtime speed.

## CodeGemma 1.1 7B IT
*google · Large Language Model, Transformers · 209 downloads · 50 likes*

CodeGemma is a family of lightweight open code models built on Gemma, specializing in code generation and code-focused dialogue.
## Distil-Whisper Large V3 German
*primeline · Apache-2.0 · Speech Recognition, Transformers, German · 207 downloads · 15 likes*

A German speech recognition model based on distil-whisper, with 756 million parameters, achieving faster inference while maintaining high quality.

## Ragas Critic LLM Qwen1.5 GPTQ
*explodinggradients · Apache-2.0 · Large Language Model, Transformers · 26 downloads · 12 likes*

The Ragas critic model is part of the Ragas synthetic test-data generation pipeline, serving as an alternative to GPT-4 for evaluation tasks.

## Distil Large V3
*distil-whisper · MIT · Speech Recognition, English · 417.11k downloads · 311 likes*

Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focused on English automatic speech recognition, offering faster inference while staying close to the original model's accuracy.

## Faster Distil-Whisper Medium.en
*Systran · MIT · Speech Recognition, English · 6,155 downloads · 4 likes*

A version of distil-whisper/distil-medium.en converted to CTranslate2 format for efficient speech recognition.

## Faster Distil-Whisper Large V2
*Systran · MIT · Speech Recognition, English · 1,336 downloads · 19 likes*

A distilled automatic speech recognition (ASR) model based on the Whisper architecture, designed for efficient inference on English speech-to-text tasks.
## Multilingual E5 Small Optimized
*elastic · MIT · Text Embedding, Multilingual · 201 downloads · 15 likes*

A quantized version of multilingual-e5-small, optimized for inference performance through layer-wise quantization while retaining most of the original model's quality.

## XLM-RoBERTa Base Language Detection ONNX
*protectai · MIT · Text Classification, Transformers, Multilingual · 6,535 downloads · 6 likes*

An ONNX conversion of papluca/xlm-roberta-base-language-detection for multilingual text classification, supporting detection of 20 languages.

## Replit Code V1.5 3B
*replit · Apache-2.0 · Large Language Model, Transformers, Other · 1,773 downloads · 295 likes*

A 3.3B-parameter causal language model specialized in code completion, supporting 30 programming languages.

## BGE Large EN V1.5 Quant
*RedHatAI · MIT · Text Embedding, Transformers, English · 1,094 downloads · 22 likes*

A quantized (INT8) ONNX variant of BGE-large-en-v1.5 with inference acceleration via DeepSparse.
## MIT AST Finetuned Speech Commands V2 OV
*helenai · Audio Classification, Transformers, English · 514 downloads · 0 likes*

An OpenVINO-optimized conversion of MIT/ast-finetuned-speech-commands-v2, designed to accelerate inference for voice command recognition.

## EfficientFormer L3 300
*snap-research · Apache-2.0 · Image Classification, English · 279 downloads · 2 likes*

EfficientFormer-L3 is a lightweight vision Transformer developed by Snap Research, optimized for low latency on mobile devices while maintaining high performance.

## MobileNet V1 0.75 192
*google · Other · Image Classification, Transformers · 31.54k downloads · 2 likes*

MobileNet V1 is a lightweight convolutional neural network designed for mobile devices, balancing latency, model size, and accuracy in image classification.
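The MobileNet entries below owe their small latency and size budgets largely to depthwise separable convolutions: each standard convolution is replaced by a per-channel depthwise filter followed by a 1×1 pointwise convolution. A quick sketch of the parameter savings (bias terms ignored for simplicity):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # A standard conv learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1x1 conv mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out
```

For a 3×3 layer with 256 input and 256 output channels, the standard form needs 589,824 parameters versus 67,840 for the separable form, roughly an 8.7× reduction at that layer.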
## MobileNet V1 1.0 224
*google · Other · Image Classification, Transformers · 5,344 downloads · 1 like*

MobileNet V1 is a lightweight convolutional neural network designed for mobile and embedded vision applications, pre-trained on ImageNet-1k.

## MobileNet V2 1.0 224
*google · Other · Image Classification, Transformers · 69.47k downloads · 29 likes*

MobileNet V2 is a lightweight vision model optimized for mobile devices, excelling at image classification.

## MobileNet V2 1.4 224
*google · Other · Image Classification, Transformers · 737 downloads · 1 like*

A lightweight image classification model pre-trained on ImageNet-1k and optimized for mobile devices.

## T5 Small OpenVINO
*echarlaix · Apache-2.0 · Large Language Model, Transformers, Multilingual · 3,749 downloads · 4 likes*

An OpenVINO IR-format version of T5-small, supporting text generation, translation, and other tasks.
## MobileNet V2 1.4 224
*Matthijs · Other · Image Classification, Transformers · 26 downloads · 0 likes*

MobileNet V2 is a lightweight convolutional neural network designed for mobile devices, excelling at image classification.

## MobileNet V2 1.0 224
*Matthijs · Other · Image Classification, Transformers · 29 downloads · 0 likes*

MobileNet V2 is a lightweight convolutional neural network designed for mobile devices, excelling at image classification.

## MobileNet V1 1.0 224
*Matthijs · Other · Image Classification, Transformers · 41 downloads · 0 likes*

MobileNet V1 is a lightweight convolutional neural network designed for mobile and embedded vision applications, pre-trained on ImageNet-1k.

## MS MARCO MiniLM L2 V2
*cross-encoder · Apache-2.0 · Text Embedding, English · 533.42k downloads · 11 likes*

A cross-encoder trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval.